OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Keyword spotting on Korean document images by matching the keyword image

Internal identifier: 001391 (Main/Exploration); previous: 001390; next: 001392

Authors: SOO HYUNG KIM [South Korea]; SANG CHEOL PARK [South Korea]; CHANG BU JEONG [South Korea]; JI SOO KIM [South Korea]; HYUK RO PARK [South Korea]; GUEE SANG LEE [South Korea]

Source: Lecture notes in computer science, ISSN 0302-9743, 2005

RBID : Pascal:06-0063103

Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that are connected. In the query creation step, a feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when document quality is poor.
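
As a rough illustration of the query creation and matching steps named in the abstract, the sketch below builds a word descriptor from per-character features and compares two words character by character. The abstract does not say which features, distance measure, or decision threshold the authors use, so the grid-density features, the Euclidean distance, the threshold value, and the names `char_features`, `word_features`, `word_distance`, and `spot` are all assumptions made for illustration, not the authors' method.

```python
import math

def char_features(glyph, grid=2):
    """Hypothetical feature: ink density in each cell of a grid x grid mesh.

    `glyph` is a list of equal-length rows of 0/1 pixels.
    """
    h, w = len(glyph), len(glyph[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def word_features(glyphs):
    """Query creation: combine the features of the constituent characters."""
    return [char_features(g) for g in glyphs]

def word_distance(query, candidate):
    """Word-to-word matching built on character matching.

    Words with different character counts are treated as non-matching
    (an assumption); otherwise the mean per-character distance is returned.
    """
    if not query or len(query) != len(candidate):
        return math.inf
    dists = [math.dist(q, c) for q, c in zip(query, candidate)]
    return sum(dists) / len(dists)

def spot(query, candidates, threshold=0.25):
    """Return indices of candidate words whose distance is within threshold."""
    return [i for i, c in enumerate(candidates)
            if word_distance(query, c) <= threshold]
```

Here `spot(word_features(query_glyphs), [word_features(w) for w in page_words])` would return the positions of the page words judged to match the query image. Character segmentation, the first step in the abstract, is assumed to have already produced the glyph bitmaps.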


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Keyword spotting on Korean document images by matching the keyword image</title>
<author>
<name sortKey="Soo Hyung Kim" sort="Soo Hyung Kim" uniqKey="Soo Hyung Kim" last="Soo Hyung Kim">SOO HYUNG KIM</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sang Cheol Park" sort="Sang Cheol Park" uniqKey="Sang Cheol Park" last="Sang Cheol Park">SANG CHEOL PARK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chang Bu Jeong" sort="Chang Bu Jeong" uniqKey="Chang Bu Jeong" last="Chang Bu Jeong">CHANG BU JEONG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Internet Software, Honam University, 59-1 Sebong-dong</s1>
<s2>Gwangsan-gu, Kwangju 506-714</s2>
<s3>KOR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Gwangsan-gu, Kwangju 506-714</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ji Soo Kim" sort="Ji Soo Kim" uniqKey="Ji Soo Kim" last="Ji Soo Kim">JI SOO KIM</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hyuk Ro Park" sort="Hyuk Ro Park" uniqKey="Hyuk Ro Park" last="Hyuk Ro Park">HYUK RO PARK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Guee Sang Lee" sort="Guee Sang Lee" uniqKey="Guee Sang Lee" last="Guee Sang Lee">GUEE SANG LEE</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">06-0063103</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 06-0063103 INIST</idno>
<idno type="RBID">Pascal:06-0063103</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000414</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000373</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000399</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Soo Hyung Kim:keyword:spotting:on</idno>
<idno type="wicri:Area/Main/Merge">001429</idno>
<idno type="wicri:Area/Main/Curation">001391</idno>
<idno type="wicri:Area/Main/Exploration">001391</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Keyword spotting on Korean document images by matching the keyword image</title>
<author>
<name sortKey="Soo Hyung Kim" sort="Soo Hyung Kim" uniqKey="Soo Hyung Kim" last="Soo Hyung Kim">SOO HYUNG KIM</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sang Cheol Park" sort="Sang Cheol Park" uniqKey="Sang Cheol Park" last="Sang Cheol Park">SANG CHEOL PARK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chang Bu Jeong" sort="Chang Bu Jeong" uniqKey="Chang Bu Jeong" last="Chang Bu Jeong">CHANG BU JEONG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Internet Software, Honam University, 59-1 Sebong-dong</s1>
<s2>Gwangsan-gu, Kwangju 506-714</s2>
<s3>KOR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Gwangsan-gu, Kwangju 506-714</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ji Soo Kim" sort="Ji Soo Kim" uniqKey="Ji Soo Kim" last="Ji Soo Kim">JI SOO KIM</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hyuk Ro Park" sort="Hyuk Ro Park" uniqKey="Hyuk Ro Park" last="Hyuk Ro Park">HYUK RO PARK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Guee Sang Lee" sort="Guee Sang Lee" uniqKey="Guee Sang Lee" last="Guee Sang Lee">GUEE SANG LEE</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science, Chonnam National University, 300 Yongbong-dong</s1>
<s2>Buk-gu, Kwangju 500-700</s2>
<s3>KOR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>Corée du Sud</country>
<wicri:noRegion>Buk-gu, Kwangju 500-700</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Database query</term>
<term>Document retrieval system</term>
<term>Electronic library</term>
<term>Feature extraction</term>
<term>Image matching</term>
<term>Image quality</term>
<term>Keyword</term>
<term>Korean</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Bibliothèque électronique</term>
<term>Appariement image</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance forme</term>
<term>Interrogation base donnée</term>
<term>Qualité image</term>
<term>Mot clé</term>
<term>Coréen</term>
<term>Système documentaire</term>
<term>Extraction caractéristique</term>
<term>Segmentation</term>
<term>Extraction forme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Système documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that are connected. In the query creation step, a feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when document quality is poor.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Corée du Sud</li>
</country>
</list>
<tree>
<country name="Corée du Sud">
<noRegion>
<name sortKey="Soo Hyung Kim" sort="Soo Hyung Kim" uniqKey="Soo Hyung Kim" last="Soo Hyung Kim">SOO HYUNG KIM</name>
</noRegion>
<name sortKey="Chang Bu Jeong" sort="Chang Bu Jeong" uniqKey="Chang Bu Jeong" last="Chang Bu Jeong">CHANG BU JEONG</name>
<name sortKey="Guee Sang Lee" sort="Guee Sang Lee" uniqKey="Guee Sang Lee" last="Guee Sang Lee">GUEE SANG LEE</name>
<name sortKey="Hyuk Ro Park" sort="Hyuk Ro Park" uniqKey="Hyuk Ro Park" last="Hyuk Ro Park">HYUK RO PARK</name>
<name sortKey="Ji Soo Kim" sort="Ji Soo Kim" uniqKey="Ji Soo Kim" last="Ji Soo Kim">JI SOO KIM</name>
<name sortKey="Sang Cheol Park" sort="Sang Cheol Park" uniqKey="Sang Cheol Park" last="Sang Cheol Park">SANG CHEOL PARK</name>
</country>
</tree>
</affiliations>
</record>
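
For reading fields from a record like the one above outside Dilib, a minimal Python sketch follows. It assumes the record is available as a string. Because the excerpt uses the `wicri:` and `inist:` prefixes without declaring them, a namespace-aware parser would reject it as written, so the sketch binds both prefixes to placeholder URIs (`urn:example:wicri`, `urn:example:inist`). These URIs, and the helper names `parse_record` and `summarize`, are assumptions for illustration and not the real Wicri or INIST namespaces or tools.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumptions): the record does not declare these prefixes.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

def parse_record(xml_text):
    """Parse a <record> string after binding the undeclared prefixes."""
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in PLACEHOLDER_NS.items())
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)

def summarize(record):
    """Extract the title, author names, and year from the TEI header."""
    stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub = record.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [n.text for n in stmt.findall("author/name")],
        "year": pub.find("date").get("when"),
    }

# Abridged copy of the record above, kept short for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Keyword spotting on Korean document images by matching the keyword image</title>
<author><name sortKey="Soo Hyung Kim">SOO HYUNG KIM</name>
<affiliation wicri:level="1"><country>Corée du Sud</country></affiliation></author>
</titleStmt>
<publicationStmt><date when="2005">2005</date></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
print(info["title"], info["authors"], info["year"])
```

The string patch is a deliberately simple workaround; if the full export declares its namespaces, `ET.fromstring` can be applied directly and the patch dropped.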

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001391 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001391 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:06-0063103
   |texte=   Keyword spotting on Korean document images by matching the keyword image
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024